5 research outputs found

    Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery

    Early-bird communication is a communication/computation overlap technique that combines fine-grained communication with partitioned communication to improve application run-time. Communication is divided among the compute threads such that each individual thread can initiate transmission of its portion of the data as soon as it is complete rather than waiting for all of the threads. However, the benefit of early-bird communication depends on the completion timing of the individual threads. In this paper, we measure and evaluate the potential overlap, i.e., the idle time each thread experiences between finishing its computation and the final thread finishing. These measurements help us understand whether a given application could benefit from early-bird communication. We present our technique for gathering this data and evaluate data collected from three proxy applications: MiniFE, MiniMD, and MiniQMC. To characterize the behavior of these workloads, we study the thread timings at both a macro level, i.e., across all threads across all runs of an application, and a micro level, i.e., within a single process of a single run. We observe that these applications exhibit significantly different behavior. While MiniFE and MiniQMC appear to be well-suited for early-bird communication because of their wider thread distribution and more frequent laggard threads, the behavior of MiniMD may limit its ability to leverage early-bird communication.
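The "potential overlap" metric described in this abstract can be illustrated with a minimal sketch. It assumes only the definition given above (the idle time between a thread finishing and the final thread finishing). The function name `potential_overlap` and the timestamps are hypothetical and are not taken from the paper's tooling.

```python
from typing import List, Sequence


def potential_overlap(finish_times: Sequence[float]) -> List[float]:
    """Return, per thread, the gap between its finish time and the last finish.

    This is the idle window during which a thread could already be sending
    its partition of the data under an early-bird scheme.
    """
    if not finish_times:
        return []
    last = max(finish_times)  # the laggard thread defines the end of the region
    return [last - t for t in finish_times]


# Hypothetical finish timestamps (seconds) for four threads in one region.
finish = [1.0, 1.5, 1.25, 2.0]
idle = potential_overlap(finish)
print(idle)                    # [1.0, 0.5, 0.75, 0.0]
print(sum(idle) / len(idle))   # 0.5625
```

A wide spread in these idle values, such as the abstract reports for MiniFE and MiniQMC, suggests more room for overlap. Tightly clustered finish times leave little.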

    Improving HPC Communication Library Performance on Modern Architectures

    As high-performance computing (HPC) systems advance towards exascale (10^18 operations per second), they must leverage increasing levels of parallelism to achieve their performance goals. In addition to increased parallelism, machines of that scale will have strict power limitations placed on them. One direction currently being explored to alleviate those issues is many-core processors such as Intel’s Xeon Phi line. Many-core processors sacrifice clock speed and core complexity, such as out-of-order pipelining, to increase the number of cores on a die. While this increases floating point throughput, it can reduce the performance of serialized, synchronized, and latency-sensitive code paths, such as traditional communication libraries. In this thesis, I examine the impact of many-core processors on large-scale scientific applications and explore ways to improve performance for both future and legacy applications. I examine the effect by characterizing the performance and power tradeoffs for different core frequencies and network hardware. Then, I explore the viability of next-generation programming models by benchmarking the performance of communication libraries utilizing multi-threaded one-sided communication. Next, I improve communication library performance for legacy applications on many-core systems by optimizing the matching algorithm to leverage single instruction multiple data vectors and caching behavior. Finally, I explore two other matching algorithm optimizations targeted at next-generation processors and applications.

    Enabling callback-driven runtime introspection via MPI_T

    Understanding the behavior of parallel applications that use the Message Passing Interface (MPI) is critical for optimizing communication performance. Performance tools for MPI currently rely on the PMPI Profiling Interface or the MPI Tools Information Interface, MPI_T, for portably collecting information for performance measurement and analysis. While tools using these interfaces have proven to be extremely valuable for performance tuning, these interfaces only provide synchronous information, i.e., when an MPI or an MPI_T function is called. There is currently no option for collecting information about asynchronous events from within the MPI library. In this work, we propose a callback-driven interface for event notification from MPI implementations. Our approach is integrated into the existing MPI_T interface and provides a portable API for tools to discover and register for events of interest. We demonstrate the functionality and usability of the interface with a prototype implementation in Open MPI, a small logging tool (MEL), and the measurement infrastructure Score-P.

    An organizational approach for the assessment of DNA adduct data in risk assessment: case studies for aflatoxin B1
